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Abstract 

The target measure \x is the distribution of a random vector in a box 
£>, a Cartesian product of bounded intervals. The Gibbs sampler 
is a Markov chain with invariant measure [i. A "coupling from the 
past" construction of the Gibbs sampler is used to show ergodicity 
of the dynamics and to perfectly simulate fi. An algorithm to 
sample vectors with multinormal distribution truncated to B is 
then implemented. 
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1. Introduction 

The use of latent variables has several interesting applications in Statistics, Econometrics 
and related fields like quantitative marketing. Models like tobit, and probit, ordered probit 
and multinomial probit are good examples. References and examples could be found 
in Geweke, Keane and Runkle (1997) and Allenby and Rossi (1999), especially for the 
multinomial probit. 

When the estimation process uses some simulation technique, in particular in Bayesian 
analysis, the need of drawing a sample from the distribution of the latent variable naturally 
arises. This procedure augments the observed data Y with a new variable X which will 
be referred as a latent variable. It is usually referred to as data augmentation (Tanner 
and Wong (1987), Tanner (1991), Gelfand and Smith (1990)) and in other contexts as 
Imputation (Rubin 1987). 

The data augmentation algorithm is used when the likelihood function or posterior 
distribution of the parameter given the latent data X is simpler than the posterior given 
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the original observed data Y. If the distribution of the variables X is a multivariate normal 
in d dimensions and Y = 1{X £ B} is the indicator function of the set B, the drawing is 
made from the normal distribution truncated to B (and/or B c ). 

The goal of this paper is to produce exact (or perfect) samples from random variables 
with distributions supported on a <i-dimensional box B\ we call box the Cartesian product 
of d bounded intervals. We construct a discrete-time stationary Markov process (Q, t £ Z) 
in the state space B whose time-marginal distribution at any time t (that is, the law of Q) 
has a given density distribution g with support in B. The construction is then implemented 
to perfectly simulate normal vectors of reasonable large dimension truncated to bounded 
boxes. The approach is also useful to show uniqueness of the invariant measure for a family 
of processes in a infinite-dimensional space [a,b] zd with truncated Gaussian distributions 
and nearest-neighbor interactions (in preparation, see also Section EJ). 

The process is the Gibbs sampler popularized by Geman and Geman (1985) and used 
for the truncated normal case by Geweke (1991) and Robert (1995) in a Markov Chain 
Monte Carlo (MCMC) algorithm to obtain samples with distribution approximating g. To 
describe the process in our setting, let d > 2 and take a box B — B(l) x ■ • • x B(d) where 
B{k) are bounded intervals: B(k) = [a k , b k ] C R, k — 1, . . . , d. If x = (x±, . . . , xj) is the 
configuration at time t — 1, then at time t a site n(t) is chosen uniformly in {1, . . . , d}, say 
n(t) = k and x k is substituted by a value y chosen with the density g conditioned to the 
values of the other coordinates; that is with the density g k (-\x) : B{k) — > IR + given by 

g k (y\x):= 9 ^ y £ B(k) , (1) 

where k = y = (x 1 , . . . , x k _ u y, x k+1 , . . . , x d ), z = (x 1 , . . . , x fc _i, z, x k+1 , ...,x d ). Note 
that g k (y\x) does not depend on x k . The distribution in B with density g is reversible 
measure for this dynamics. 

We construct a stationary Gibbs sampler t £ Z) as a function of a sequence U_ = 
(U~t,t £ Z) of independent identically distributed and uniform in [0,1] random variables 
and the updating schedule k = (K,(t),t £ Z). We define (backwards) stopping times 
r(t) £ (— oo,t] such that {r(t) = s} is measurable with respect to the field generated by 
the uniform random variables and the schedule in [s + l,t]. The configuration at time t 
depends on the uniform random variables and the schedule in [r(£) + l,t]. 

In fact we construct simultaneously processes (Cf 8t u — oo <s<t<oo, C & B). For each 
fixed s and (, (Q s t -,, s < t < oo) is the Gibbs sampler starting at time s with configuration 
£. For each fixed t, ((f st ], s <t, ( E B) is a. maximal coupling (see Thorisson (2000) and 
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Section [2]). For each fixed time interval [s,t] the process {Q st n, s < t' < t) is a function of 
the uniform random variables and the schedule in [s + l,t] and the initial (. The crucial 
property of the coupling is that t does not depend on (. 

The construction is a particular implementation of the Coupling From The Past ( CFTP) 
algorithm of Propp and Wilson (1996) to obtain samples of a law v as a function of a finite 
(but random) number of uniform random variables. For an annotated bibliography on the 
subject see the web page of Wilson (1998). A Markov process having v as unique invariant 
measure is constructed as a function of the uniform random variables. The processes 
starting at all possible initial states run from negative time s to time using the uniform 
random variables U s , . . . ,Uq and a fixed updating schedule. If at time all the realizations 
coincide, then this common state has distribution v. If they do not coincide then start the 
algorithm at time s — 1 using the random variables U s -i, . . . , Uq and continue this way up 
to the moment all realizations coincide at time 0; r(0) is the maximal s < such that this 
holds. The difficulty is that unless the dynamics is monotone, one has to effectively couple 
all (non countable) initial states. Murdoch and Green (1998) proposed various procedures 
to transform the infinite set of initial states in a more tractable finite subset. 

In the normal case, for some intervals and covariance matrices, the maximal coupling 
works without further treatment. Moeller (1999) studies the continuous state spaces when 
the process undergoes some kind of monotonicity; this is not the case here unless all 
correlations are non negative. Philippe and Robert (2003) propose an algorithm to perfectly 
simulate normal vectors conditioned to be positive using coupling from the past and slice 
sampling; the method is efficient in low dimensions. 

Consider a c/-dimensional random vector Y e M. d with normal distribution of density 



where the covariance matrix X is a positive definite matrix, \i is the mean vector and 



In this case we say that Y ~ N{ii, X). We denote X the truncation of Y to the box £>; its 
density is 




(2) 




(3) 




(4) 



where Z(B) = P[X G B] — j B /(x)<ix is the normalizing constant. 
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We have tested the algorithm in some examples. In particular for the matrix S" 1 = 
|/ + where I is the identity matrix and V = (1, . . . , 1). For the box [0, l/2] d and for 
d > 14 our method is faster than the rejection method. The simulations indicate that the 
computer time for the rejection method grows exponentially with d while it grows like d 4 
for our method. Using a simple program in Matlab in desktop microcomputer the method 
permits to simulate up to dimension d = 30 for the interval [0, l/2] d . For bigger boxes or 
boxes apart from the mean, for instance [1/2, l] d , our method is less efficient but in any 
case it is much faster than the rejection one. Since our method uses several times the same 
uniform variables, to compare the computation time we have counted the number of times 
each of the one-dimensional uniform random variable is used. It is expected that if the box 
B is sufficiently small and the covariance matrix has a "neighbor structure", the number 
of steps may grow as (dlnd) 2 . In this case one should be able to simulate the state of an 
infinite dimensional Gaussian field in a finite box. 

Correlated normal vectors can be mapped into independent standard normal vectors 
using the Cholesky transformation (see Chapter XI in Devroye (1986)). In the case of 
truncated normals to a box, the transformation maps the box into a set. One might 
simulate standard normals and reject if they do not belong to the transformed box. 
The difficulties are similar to those of the rejection method: as the dimension grows, 
the transformed set has small probability and the expected number of iterations grows 
exponentially with the dimension. 

When the truncating set is not a box, our method generally fails as the coupling event 
for each coordinate has probability zero in general. We illustrate this with an example in 
Section [6J 

Another possibility, also discussed by Devroye (1986) is to compute the marginal of 
the first coordinate, then the second marginal conditioned to the first one and so on. 
The problem here is that the computation of the marginals conditioned to all the other 
coordinates be in B may be as complicated as to compute the whole truncated vector. 

In Section [2] we define coupling and maximal coupling. In Section [3] we describe the 
stationary theoretical construction of the Gibbs sampler and the properties of the coupling. 
Section H] is devoted to the pseudocode of the perfect simulation algorithm. In Section [5] 
we compute the functions entering the algorithm for the truncated normal case. In Section 
[6] we give some examples and compare our perfect simulation algorithm with the rejection 
one based on the uniform distribution. 
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2. Coupling 

A coupling of a family of random variables (X x , A G A) with A a label set, is a family of 
random variables (X x : A G A) with the same marginals; that is, such that 

X x £ X x , AG A. 
An event C is called coupling event, if Xa are identical in C . That is, 

C C {X x = X y for all A, A' G A}. 
We consider continuous random variables X x in M. with densities f x satisfying 

P(C)< [ M f x (y)dy. (5) 

for any coupling event C. This is always true if A is countable and in the normal case 
treated here. When there is a coupling event C such that identity holds in (|3j) the coupling 
is called maximal. See Thorisson (2000) for a complete treatment of coupling, including 
historical quotations. 

A natural maximal coupling of these variables is the following. Let G x (x) = g x (y)dy 
be the corresponding cumulative distribution functions and define 

R(x) := / inf g x (y)dy; R := R(oo) . (6) 

Let G\{x) = G x (x) — R(x) + R. Let U be a random variable uniformly distributed in [0, 1] 
and define 

X x = R~\U)1{U <R} + G~\U)1{U > R} , (7) 
where the generalized inverse of G, is defined by 

G(x)>y^x>G- 1 (y). (8) 

The marginal distribution of X x is G x : 

¥(X X < x) = F(U < R(x)) +F(R<U < G x (x)) = R(x) + G x (x) - R = G x (x) . 

The process (X x , A G A) is a maximal coupling for the family (X x , A G A). We use this 
coupling to construct the dynamics starting from different initial conditions. Then we show 
how to compute R(x) and R in the normal case. 
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In the sequel we use a family of couplings. Let (A^, t > 1) be a decreasing sequence of 
parameter sets: A^ +1 C A^ for all £>1. Let R[o](x) = and for £ > 1, 

/X 
inf 9\{y)dy\ R w := %(oo) . (9) 

Assume lim^oo i?^ = 1. Define = -R[£](#) — -R[£_i](x) + R\t-i] and for A G A^\ A^ +i 

define Ga,[£](x) = Ga(x) — R[£]( x ) + an d 

i 

Xx = Y,D^(U)l{U€ [Ri-uRi)} + G- x l e] (U)l{U>R [e] }. (10) 

i=l 

Then (Xa, A G A^) is a maximal coupling of (X\, A G A^) for each I. 



3. A stationary construction of Gibbs sampler 

The goal is to define a stationary process f 6 Z) in B with the Gibbs sampler 
dynamics associated to a density g in i3. The time marginal of the process at any time will 
be the distribution with the density g. The Gibbs sampler is defined as follows. Assume 
Qt-i G B is known, then at time t choose a random site n(t) uniformly in {1, ...,d} 
independently of "everything else". Then set Ct{k) = Ct-i(k) if k ^ n{t) and for k = n(t) 
choose (t{k) with the law g conditioned to the values of ( t -i at the other coordinates: 

F{( t (k)eA\( s , s<t)= f A g k {y\{t-i)dy if k = «(t) , (11) 

where gi~(y\x), y G B{k) := is the conditional density of the fcth component of g 

given the other coordinates given in ([T]). Reversibility follows from the identity 

/ / H 1 (y)H 2 (x)gk(y\^)dyg(x)dx= / / H 1 (x)H 2 (y) ^(y|x) dyg(x)dx 

JB J B(k) JB J B{k) 

for any H\ and if 2 continuous functions from B to R, where x = (xi, . . . ,2^) and y = 
(sci, . . . , Xk-i, y, Xk+i, ■ ■ ■ , Xd). In particular the law with density g is invariant for the 
Gibbs sampler. It is indeed the unique invariant measure for the dynamics, as we show in 
Theorem [TJ 

To define a stationary version of the process we construct a family of couplings for each 
time interval [s,t]. The coupling starts with all possible configurations at time s and uses 
random variables associated to each time to decide the updating. All couplings use the 
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same updating variables. To reflect the evolution of all possible configurations we introduce 
a process on the set of boxes contained in B. The initial configuration of this process is B 
at time s but at times s' > s can be (and will be as s decreases) reduced to a point. Let £ 
be a ci-tuple of closed bounded intervals of R, £(z) C B(i), i — 1, . . . , d. We abuse notation 
calling £ also the box 

£ = £(l)x-..x£(d)cB. 
Notice that £ has dimension less than d when £(z) is just a point for some i. Define 

R k (x\0 = / (mmg k (y\x))dy, x e B(k) (12) 

i2*(0 = (13) 

These functions depend on £ only through « ^ -Rfc(£) is the probability that if the 
fc-th coordinate is updated when the other coordinates belong to the set £, then a coupled 
event is attained for the k coordinate. This means that for any configuration x G £, the 
updated value of the fc-th coordinate will be the same, say x; its law is given by Rk(x\£). 
In this case the updated interval is also reduced to the point x. If the coupled event 
is not attained £(k) is kept as the interval B(k). 

The updating random variables consist on two families: U_ = (U t : t G Z), a family of 
independent variables with uniform distribution in [0,1] and : t G Z), a family of 

independent variables with uniform distribution in {1, . . . , d} and independent of U_. 

Now for each s G Z we define a process (//[ S) t], t>s) taking values on boxes contained in 
B as function of ((Ut, n(t)), t > s). The initial state is 7][ SlS ] = B and later each coordinate 
k is either a (random) point or the full interval B(k). More precisely, for s G Z and x G R 

set 

= 0, (a;) =0, = 0, rj[s,s] = B. (14) 

Fix n > 1 and assume i2u it i(x), -R[ s ,ti and 7yr S) t] are defined if < t — s < n — 1. Then for 
t — s = n and x G B(n(t)), define 

-Rm = -R K (t)(?7[ s ,t-i]) (15) 

i?[a,t](x) = R K (t)(x\V[s,t-l]), (16) 

D [Sjt ](x) = R[ s +i,t] + R[s,t](x) - R[ s+ i jt ](x) 

rj[ s>t _i](k) , if k^K(t), 

V[s '^ k) = ^ V[s+x,t]( k ) HUt < R [s+ i,t]} + D-] t] (U t ) l{R [s +x,t] <U< R M } ■ (17) 

+ B(k)l{U t > R { s, t ]}, iik = K(t); 
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In words: k = n(t) defines the coordinate to be updated at time t. R[ s ,t] is the probability 
that the coupling event is attained at coordinate n(t) for all the processes starting at times 
smaller than or equal to s. The coupling event is attained for the process starting at s 
when U t < R[ 8 ,t] ■ m case the coupling event is attained, the value of coordinate k is given 
by D7,MJ t ), for s' given by the biggest s < t such that U t < R[ s ,t] (second term in (fI7jl ). 
This value is the same for each s < s' (first term in ( TT7T) ). When the coupling event is not 
attained (that is, for s > s'), the coordinate k is kept equal to the full interval B{k) (third 
term in (TTTT) ) . 

It follows from this construction that (R[t-i,t](-) , R[t-x,t]i ^[t-i,t]) does not depend on U_ 
and, for s < t — 1, 

(R[ St t\(-), R[ s ,t], D [Sjt] ) is a function of ((U n , re(ra)), n = s + 1, . . . , t - 1). (18) 
77[ S)t ] is a function of (R[ 8 ,t](-), R[ s ,t], D[s,t]) and {U u n(t)) . (19) 

The kth coordinate of r)u )t -\ is either the interval B(k) or a point. The process is monotonous 
in the following sense: 

V[s-i,t] C r][ Sj t\ for all s < t . (20) 

In particular, if some coordinate is a point at time t for the process starting at s, then it 
will be the same point for the processes starting at previous times: 

If r][ S)t ](k) is a point, then r] [s _ ljt] (k) = r] [S)t] (k), for all s < t . (21) 

For each time t define 

r(£) = max{s < t : n St t\(k) is a point for all k} . (22) 

Using 018p we conclude that r(t) + 1 is a stopping time for the filtration generated by 
({Ut-m K(t — n)),n > 0): the event {r(t) = s} is a function of ((U n , k(ti)), n — s + 1, . . . , t). 

Assume r(t) > — oo almost surely for all t and define the process (Q, t £ Z) in i3 by 

C* :=%(*>,*] GB. (23) 

Noticing that r(t) > — oo is equivalent to R\ n ,t] 1 as n ^ — oo, we get the following 
explicit expression for ( t ~- 

( Ct-i(k), \ik^K{t), 
Ct(k) = { . (24) 

I E„< t -i D-^Ut) l{R [n+1 , t] <U t < R [nA } , if k = 
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For each s < t and ( G B we now construct the process (Cf st i, t > s), the Gibbs sampler 
starting with ( at time s. Let Gk(x\x) = gk(z\x)dz and 

G M (x\C) = R [s , t ] + G K{t) (x\0 - R [s , t] (x) . (25) 
For each ( G B define Q s s i = C and f° r t > s, 



El= s G i2 M ]> ^l,^) (26) 

+ i{f/ t > R^G^mci^) , if fc = K(t) 



For each fixed t and k, ((Cf 8 t ](k), ( G B), s < t) is a maximal coupling among the processes 
starting with all possible configurations ( at all times s < t. 

Theorem 1. Assume r(£) > —oo almost surely for all t 6 Z. Then the process ((t, t G Z) 
defined in (Efy is a stationary Gibbs sampler related to g. The distribution of ( t at each 
time t is the distribution with density g in B. This distribution is the unique invariant 
measure for the process and 



C 



supP(C^Ct)<P(T(*)<s), (27) 



where tf st -, is the process starting with configuration £ at time s. 

Proof. Stationarity follows from the construction. Indeed, for all s < t and i G Z, ?7[ Sjt ] as a 
function of ((£/„, n(n)),n G Z) is identical to r)[ s+ gt+£\ as a function of ((U n +e, n(n+£)),n G 
Z). 

The function % : [0, 1] x [0, _> defined in ([24]) by 

tf t (?7; U t - U U t - 2 , ...)= D^ t] {U) l{R [n+1 , t] <U< R [n>t] } (28) 

n<t-l 

satisfies for = 

P(* t {U;U t - U U t - 2 ,...)eA)= [ g k (y\Ct-i)dy, (29) 

where we used that ( t -i is function of U t -i, U t -2, ■ ■ ■ and «t_i, k 4 _ 2 , .... This is sufficient 
to show that evolves according to the Gibbs sampler. 
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To finish we need to show that the distribution with density g is the marginal law of (f 
Since the updating is performed with the conditioned distribution, the distribution with 
density g is invariant for the Gibbs sampler. 

Since Q st , = C[ s ,t] if s < r(t), then (T2T1) follows. This also implies that lim^-fx, Q s f , = ( t 

almost surely. In particular Q,^ converges weakly to the law of Q as s —> — oo. The same 

is true for lim^oo t y Taking £ random with law g, this proves £ t must have law g. Then 
taking ( with law g invariant for the dynamics, we conclude g = g and the uniqueness of 
the invariant measure follows. □ 

Lemma 1. A sufficient condition for r{t) > — oo a.s. is that Rk{B) > for all k. In this 
case F(r(t) < s) decays exponentially fast with t — s. 

Proof. r(t) > T°(t) := max{s < t — d : k(s + n) = n and U s+n < R K ( s+n )(r]),n = 
1, . . . ,d}, the last time in the past the coordinates have been successively updated to a 
point independently of the other coordinates. At T°(t)+d and successive times f > T°(t)+d, 
each coordinate of r]^ T o^^ has been reduced to a point. r (t) is finite for all t because the 
event {k(s + n) = n and U s+n < R K ( s+n )(rj) , n — 1, . . . , d} occurs for infinitely many s with 
probability one. The same argument shows the exponential decay of the tail of r(t). □ 

Remark The velocity of convergence of the Gibbs sampler to equilibrium depends on the 
values (Rk(B), k = 1, . . . , d). In turn, these values depend on the size of the box B and the 
correlations of the distribution g. Strongly correlated vectors produce small values Rk(B) 
and hence slow convergence to equilibrium. The efficiency of the algorithms discussed in 
the next sections will also depend on these values. 

4. Perfect simulation algorithm 

The construction of Section [3] is implemented in a perfect simulation algorithm. Let £ 
be a box contained in B. For each k = 1, . . . , d, let Rk(-\Q and Rk(0 = Rk(bk\0 as in 
f|T2|) and ffl~3l) . When £ = B we denote by R^ the value of Rk{B). Let K{t) be the updating 
schedule. It can be either a family of iid chosen uniformly in {1, . . . , d} or the periodic 
sequence nit) — [t — l] mo d a + 1- For each pair r], £ of boxes contained in B and 1 < k < d, 
let Dk,n£ '■ B{k) — > [0, 1] be the function defined by 

D k ^(x) = R k {x\C) - R k (x\v) + Rk(v) {x e B(k)) . 
Let £i,^2 be boxes contained in B, u e [0,1], and 1 < k < d. The auxiliary coupler 
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function <\> is defined by 
function 0(£i, £2, u i k) '■ 

£(*) <-&(*) 

if ^(fc) is not a point in R and w< 
update the fcth coordinate of £: 

end 

Return (^ 2 ) 



Perfect Simulation Algorithm: Perform the following steps 

T <— 
start (?7o) 

while 7] is not a point in B 

T <— T — 1 
start (777O 
i<- T 

while t < update rft+i using the coupler function 

Vt+i *- <t>(Vu m+i, U t +i, «(t + 1)) 
t <- t+1 

end 

end 

Return (r/o) 

Where, for each T < 0, "start (77T)" is the following subroutine 

start (.t)t) '■ 
t)t <- B 

Generate C/ r a uniform random variable in [0, 1] 

if Ut < Rk(T) > compute the «(T)th coordinate of t]t'- 

Tft.(*(T))<- i^(Wrl^) 

end 
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5. Truncated Normal distributions 

In this section we implement the construction in the normal case. We start with 
elementary facts of the one dimensional Normal distribution. 



One dimension Let 4>{x) = x2 l 2 be the standard normal density in K. and $( 



■T) 



(f>(y)dy the corresponding distribution function. Let a < b be real numbers. A random 
variable X with mean fi and variance a 2 has truncated normal distribution to the interval 
[a, b] if its density is given by 

q a h(x: u, a 2 ) = r - , a < x < b, (30) 

yaM >a*> ) a ^b- R -j _ - - ' V J 

where \i G IR and o > 0. This distribution is called T a h Af(n, a 2 ). The truncated normal is 
the law Af(fi, a 2 ) conditioned to [a, b]. Let 

A a>b (fj,) = $(b- t j,)-$(a-tJ,). (31) 

Let Xy, fi G I = be a family of normal distributions truncated to the interval [a, b\. 

X,j, ~ T a ^N{ii., o" 2 ). Let t4 ± = and x(I, a) = x a ^(I, a) be defined by 



and 

2 



Let 7 = |ju , and 



= ^-^lnf4x)- (33) 



(34) 



i2(x|J) = / inf g atb (y;fi,a 2 )dy, xe[a,b] 

Ja ^ 



R{I) = R{b\I). (35) 

The next proposition, proven later, gives an explicit way of computing the functions 
participating in the multicoupling. It says that the infimum of normal densities with 
same variances a and means in the interval I = coincides with the normal with 

mean \x~ up to x(I, a) and with the normal with mean /i + from this point on. 
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Proposition 1. The integrand in (34\ ) has the following explicit expression 

inf g a> b(x, /z, a 2 ) = g a ,b( x 'i M~> °" 2 )1{^(^) °") < ^} + £0,6(2?; o" 2 )l{^(-^, cr) > x}. (36) 
As a consequence, 



R(x\I) 



1 



x — jl^ 
a 



a — ji 
a 



+ 



1 
1 

I 1 



x(/,<t)-/z 



l{a; < cr)} 
a — fi 



a 



X — 11 



a 



x(/,cr) - /T 



Jn particular, 



x(J, a) — /i H 



a 



a — [T 



a 



G 



1{S > 

l{x > z(i»}. 



b — fi 



a 



cr) — fx 



a 



and for each u G [0, -R(/)] there exists a unique x u G [a, 6] suc/i £/ia£ R(x u \I) = u. 

Remark < R(I) < 1. Indeed, R(I) > because inf^j g a ,b{x\ cr 2 ) > 0. If R(I) = 1, 
then all densities coincide unless /i~ = by hypothesis this trivial case is excluded. 

Multivariate Normal Let Y ~ A/"(/i, S), X = Y|# be the vector conditioned to B and 
x G <B. The law of X fc conditioned to (JQ, z 7^ fc) = (x;, z 7^ fc) is the truncated normal 
T ak , bk N{fi k {*),cr 2 ), where 

. .0 1 



-E 



aki\%i - V>i), cr k 



(37) 



We see that /Zjt(x) does not depend on x^ and 

fa(x) e K( x )^fe( x )] • 

where 



= — 2^ I aA -' i<3i + ^2 akibi 

i=l \aj.j<0,j^fc aj.j>0,i^fc 



— V ( V 

i=l \aj.j<0,« 
1 d if 

— ^2 afc ^ — I akihi + ^2 

i=l \&ki <0,i^=k ifcz^O)* 
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Remark Since Rk{B) > for all k, r(t) > — oo almost surely for all t G Z. This implies 
that the truncated multivariate normal case falls under Theorem's [T] hypothesis and our 
algorithm works. 



Back to one dimension. Minimum of truncated normals We prove Proposition [IJ 
To simplify notation write g a ,b(x] n) instead of g a ,b(x] 1). Observing that 

, Q\ 1 / X Li 

g ab {x;ii,a ) = -g* b [-, - 

it is sufficient to prove (136|) for a = 1. The proof is based in the following elementary 
lemmas. 

Lemma 2. Let fix < Then 

min(5f aib (x;/!i),5f aife (x;/i 2 )) = g a ,b(x; /j,i)1{x(ji u li 2 ) < x} 

+ g a ,b( x 'i A t 2)l{a;(/ii, /i 2 ) > x}, (38) 



xfaulto) = - In ( 444 ) • (39) 



where 

x{lh,li 2 ) = — 
Lemma 3. For all \x G (fi~ , fi + ) it holds 



//+-// 2 /!-/! V ^a,b(A*) 



Proof of Proposition [JJ Let \x G (/i ,/i + ). By Lemma [2] we have 

0a,*fo/O<*tote/O ^ - -J— ln f < s, (41) 

g a ,b{x^) < 9aM» + ) * — ^ — In (-r^r-^r] < x . (42) 

2 \A a>b (fj, + )J 

Hence g a ,b{x\ ij) < min(g a>b (x; fj,~), g ajb (x; if and only if 

1 ln /A^\ ^ L_ inf ^1)1 . (43) 

L 2 v^(/i + ); 2 V^WM/J 1 ; 
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But by Lemma [3] the interval in (j4"3l is empty for all \i G (/i , This implies that 
min(g aib (x; g a>b (x; fj, + )) < g a>b (x; fj,), from where 



holds for cr = l. The corresponding identity (136]) follows by applying Lemma [2] to ji\ = /i 



In this section we compare our method with the rejection method in some examples. 
The conclusion is that the rejection method may be better than ours in dimension 2 for 
some regions but ours becomes better and better as dimension increases. Then we show 
why the method does not work when the region is not a box. This discards the following 
tempting approach: multiply the target vector by a matrix to obtain a vector with iid 
coordinates. The transformed vector is much easier to simulate; the problem is that it is 
now conditioned to a transformed region. When the region is not a box, the conditioned 
vector is not an iid vector and a coupling must be performed. We show here that in general 
the corresponding coupling event has probability zero. 

The rejection method We compare our algorithm with the following rejection algo- 
rithm: simulate a uniformly distributed vector (x, u) in B x [0, max xe g g(x)]. If u < g(x), 
then accept x. We use d + 1 one-dimensional uniform random variables for each attempt 
of the rejection algorithm. Let N be the expected number of one-dimensional uniform 
random variables generated by our perfect simulation algorithm, each counted the number 
of times that it is used. Let M be the expected number of one-dimensional uniform random 
variables needed by the rejection algorithm to produce an accepted value. We divide the 
experiments in two parts: d = 2 and d > 2. 



B{x 1 ,x 2 ) = [0,1] x [0,1] + (x 1 ,x 2 ), -4 < x x < 4 and < x 2 < 4. (ii) boxes B(r) = 
[0,r] x [0,r] (type 1) and B(r) = [-r, 0] x [0,r] (type 2), r = 1/2, 1,2,3,4,5,6. The results 
are shown in Tabled] for (i) and in Figured] for (ii). 




and \x 2 = 



□ 



6. Comparisons and comments on other methods 
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X 2 \ X\ 


-4 


-3 


-2 


-1 





1 


2 


3 


4 





65 


37 


16 


5 


4 


12 


29 


55 


86 


1 


76 


45 


22 


8 


4 


8 


22 


45 


75 


2 


88 


55 


28 


11 


4 


6 


17 


37 


64 


3 


100 


65 


36 


15 


5 


5 


12 


30 


54 


4 


112 


77 


45 


21 


7 


4 


9 


23 


45 



Table 1: d = 2 case (i): values of M{x\, x 2 ), mean number of one-dimensional uniform random 
variables used by the rejection method. The corresponding number for our method, N(xi,x 2 ) £ 
[3.2, 3.3] is better for all boxes. 




(a) (ii) type 1. (b) (ii) type 2. 



Figure 1: d = 2 case (ii): (r,N(r)) (in solid line) and (r,M(r)) (in dashed line) for boxes of 
both types. For type 2 our method is faster than the rejection method, but for type 1 this is true 
when r < 4. 
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[d > 2] example 1. Fix ji = (0, ... ,0) and S -1 = |7 + where I is the identity 

matrix and 1' = (1, . . . , 1). We consider two cases: (i) boxes B = [0, d = 2, . . . , 29 and 

(ii) boxes B= [§, 1] , d = 2, . . . , 14. The results are shown in Figure [2] and Table [2] for (i) 
and (ii), respectively. 




(a) (b) (c) 

Figure 2: Case (ii) d > 2. (a) (d,iV(d)); (b) (d, iV(d)/d 4 ); (c) (d,N(d)) (solid line) and (d,M(d)) 
(dotted line). It seems that iV(d) grows as o(c? 4 ) and M(d) as (d + l)e ad , for some a > 0. For 
d > 14 our method is faster than the rejection method. 



d 


2 


3 


4 


5 6 


7 


8 


9 


10 


11 


12 


13 


14 


N 


3 


6 


11 


17 27 


36 


48 


60 


84 


106 


147 


164 


208 


M 


5 


10 


22 


51 133 


372 


1141 


3810 


13736 


53356 


223520 


981460 


4858138 



Table 2: Case (i) d > 2. N is the mean number of uniforms used by our method and M by the 
rejection method. 

[d > 2] example 2. Let \x = (0, . . . , 0) and X -1 be the matrix, with "neighbor structure", 
= pl J_i ll{|j — z| < 1}, with p = |. We consider boxes of the form B = [0, l] d . The 
results are shown in Figure O 

Strongly correlated variables The algorithm slows down very fast with the dimension 
when the variables are strongly correlated. An Associate Editor proposes to consider a 
Gaussian vector truncated to the box I d = [0, l] d , with mean \x = and covariance matrix 



S = el+ (1 -e)ll' 



(44) 
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(a) (b) (c) 

Figure 3: (a) (d,N(d)); (b) (d, N(d)/(d\og d) 2 ); (c) (d,N(d)) (in solid line) and (d,M(dj) (in 
dotted line). It seems that N(d) grows as (dlogd) 2 and M(d) as (d + l)e ad , for some a > 0. Our 
method is faster than the rejection method for all d. 

where I is the identity matrix and 1' = (1, . . . , 1), for small values of e. As a measure 
of the speed of the algorithm we have computed the coefficient R, shown in Table [3], as 
function of e and the dimension d. In d = 2 the coupling time r(0) for this example is a 



e\d 


2 


4 


8 


16 


32 


0.1 


0.5139 


0.3446 


0.2792 


0.2507 


0.2375 


0.01 


0.8753 • 10" 3 


0.3121 • 10~ 4 


0.5969- 10" 5 


0.2615 ■ 10~ 5 


0.1731 • 10" 5 



Table 3: Coefficient R for [i = 0, £1 = el + (1 — e)W! where I is the identity matrix and 
1' = (1, . . . , 1), and the box is [0, l] d . 

geometric with mean 1/R, because a it suffices that one of the uniforms is smaller than R 
to get the coupling in the next step. In general, as explained in Lemma [IJ r(0) < t°(0), 
whose expectation is of the order (l/i?) d_1 . 

Other regions Let Y ~ A/"(0, X) be a <i-dimensional multinormal random vector with 
zero mean and nonsingular covariance matrix S. Let B = [a\, bi] x • • -x [ad, bd]- If we rotate 
and scale the coordinates to obtain iid A/"(0, 1)'s, i.e., if we put X = S _1 / 2 Y, then the box 
constraints are of the form a < E 1 / 2 x < b, where a = (ax, ... , ad) and b = (6 1; . . . , 6 d ). 
When the constraints are of this form, the conditional distribution of Xfc|X fc = x fe has 
a probability density function whose support depends on the condition x fe . In such case 
the maximal coupling in our approach may have a coupling event with zero probability. 
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Indeed, it could happen that 

inf g Xk \x= x (xk) = 0. 

X 

This implies that R k {x k ) = J"*^ inf x g Xk \x=M d V = for a11 x k- 

To see this in d — 2, observe that the transformation takes the rectangular box into 
a parallelogram T with sides not in general parallel to the coordinate axes. We need to 
simulate the standard bi-dimensional normal X conditioned to T. It is still true that the 
law of X\ conditioned on X2 = y is normal truncated to the slice of the parallelogram 
T y := {(xi, £2) £ T '■ %2 = y}- It is a matter of geometry that there are different y and y' 
such that T y fl T v = 0; this implies the infimum above is zero. 

Take for instance \i — (0, 0), S = ( ^ ^ j and Y conditioned to the box B = 

[0, 1] x [0, 1]. Then ST 1 / 2 = | f ^ ^ \ and the transformed box is a parallelogram T 
with vertices (0,0), (2/3,1/2), (1/3,2/3), (1,1). 

v' 




Figure 4: In a parallelogram region the minimal value inf x gxi\x=x( x i) is zero. The truncated 
distribution conditioned to the second coordinate being y and y' respectively have disjoint 
supports J- y and T v , which are the projections on the horizontal axis of the green and red 
segments, respectively. 
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7. Final remarks 



We have proposed a construction (simulation) of a multidimensional random vector 
conditioned to a bounded box using a version of the coupling from the past algorithm 
of the Gibbs sampler dynamics. In the case of normal distribution, we have given a 
explicit construction taking advantage of the bell shape of the density and the fact that 
the distribution of one coordinate given the others is a truncated normal with variance 
independent of the other coordinates. This allows to define the coupling set C in function 
of the extreme possible values of the mean and to obtain positive probability for each 
coupling set C as soon as the box is bounded. 

One of the consequences of the approach is to show that the Gibbs sampler is ergodic in 
the box if the coupling event has probability uniformly bounded below in each one of the 
coordinates. This corresponds to the condition Rk(B) > for all coordinate k. When the 
uniform random variable used to update site k at some time falls below Rk(B) > 0, this 
coordinate coincides for all processes. 

The method can be used to show ergodicity of the process in infinite volume when 
the inverse of the covariance matrix has a neighbor structure; for instance for tridiagonal 
matrices. As an example, consider the formal density in [a, b} z : 



which corresponds to an infinite volume Gaussian field, each coordinate truncated to [a, b] C 
R; (3 > is a parameter (inverse temperature in Statistical Mechanics). The Gibbs sampler 
can be performed in continuous time and the coupling can be done as in the finite case. 
However the updating of site k depends only on the configurations in sites j such that 
\j — k\ = 1. Each site gets a definite value (and not an interval) when the corresponding 
uniform random variable falls below a certain value R = R(/3, [a, b}) (corresponding to 
Rk{B), but now it is constant in the coordinates). To determine the value of site k at time 
t one explores backwards the process; calling U the random variable used for the last time 
(say t') before t to update site k there are two cases: U < R or U > R. If U < R, we do 
not need to go further back, the value is determined. If U > R, we need to know the values 
of 2d neighbors at time t'. Repeating the argument, we construct a "oriented percolation" 
structure which will eventually finish if R > 1 — ^j. This value is obtained by dominating 
the percolation structure with a branching process which dies out with probability R and 
produces 2d offsprings with probability 1 — R. The value of R depends on the length of 
the interval [a, b] and the strength of the interaction governed by j3. One can imagine that 
for small intervals and small (3 things will work. 
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